Speech Recognition with Dynamic Bayesian

نویسنده

  • Stuart J. Russell
چکیده

Dynamic Bayesian networks (DBNs) are a powerful and exible methodology for representing and computing with probabilistic models of stochastic processes. In the past decade, there has been increasing interest in applying them to practical problems, and this thesis shows that they can be used eeectively in the eld of automatic speech recognition. A principle characteristic of dynamic Bayesian networks is that they can model an arbitrary set of variables as they evolve over time. Moreover, an arbitrary set of conditional independence assumptions can be speciied, and this allows the joint distribution to be represented in a highly factored way. Factorization allows for models with relatively few parameters, and computational eeciency. Standardized inference and learning routines allow a wide variety of probabilistic models to be tested without deriving new formulae, or writing new code. The contribution of this thesis is to show how DBNs can be used in automatic speech recognition. This involves solving problems related to both representation and inference. Representationally, the thesis shows how to encode stochastic nite-state word models as DBNs, and how to construct DBNs that explicitly model the speech-articulators, accent, gender, speaking-rate, and other important phenomena. Technically, the thesis presents inference routines that are especially tailored to the requirements of speech recognition: eecient inference with deterministic constraints, variable-length utterances, and online inference. Finally, the thesis presents experimental results that indicate that real systems can be built, and that modeling important phenomena with DBNs results in improved recognition accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

متن کامل

Hidden factor dynamic Bayesian networks for speech recognition

This paper presents a novel approach to modeling speech data by Dynamic Bayesian Networks. Instead of defining a specific set of factors that affect speech signals the factors are modeled implicitly by speech data clustering. Different data clusters correspond to different subsets of the factor values. These subsets are represented by the corresponding factor states. The factor states along wit...

متن کامل

Robot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm

This paper aims to develop a writing robot by recognizing the speech signal from the user. The robot arm constructed mainly for the disabled people who can’t perform writing on their own. Here, dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...

متن کامل

Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Audio‐visual speech recognition is a natural and robust approach to improving human‐robot interaction in noisy environments. Although multi‐stream Dynamic Bayesian Network and coupled HMM are widely used for audio‐visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete s...

متن کامل

Articulatory feature recognition using dynamic Bayesian networks

We describe a dynamic Bayesian network for articulatory feature recognition. The model is intended to be a component of a speech recognizer that avoids the problems of conventional “beads-on-a-string” phoneme-based models. We demonstrate that the model gives superior recognition of articulatory features from the speech signal compared with a stateof-the art neural network system. We also introd...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998